[Bug Fix] Remove incremental logic #97
Conversation
@fivetran-avinash a few questions and change requests before approval.
models/shopify__discounts.sql (Outdated)

```diff
@@ -1,8 +1,7 @@
 {{
     config(
-        materialized='table' if target.type in ('bigquery', 'databricks', 'spark') else 'incremental',
+        materialized='table',
```
Do we need any of this configuration anymore now that the incremental strategy is being removed? Why do we need to define the cluster and unique_key if there is no incremental strategy leveraging them?
Same question on the other incremental updates in this PR.
Agreed. Removed.
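To illustrate the cleanup (a hypothetical sketch, not the exact model code): once the model is always materialized as a plain table, the settings that only an incremental run uses, such as `unique_key`, `partition_by`, and `cluster_by`, can be dropped along with the materialization branch.

```sql
{{
    config(
        -- Before (sketch): incremental-only settings, now unused
        -- materialized='table' if target.type in ('bigquery', 'databricks', 'spark') else 'incremental',
        -- unique_key='discounts_unique_key',  -- hypothetical key name
        -- partition_by={'field': 'created_timestamp', 'data_type': 'timestamp'},  -- hypothetical
        -- After: a plain table materialization needs none of the above
        materialized='table'
    )
}}
```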
CHANGELOG.md (Outdated)

- These models utilized the `merge` incremental strategy on BigQuery and Databricks, as we could not rely on a time series timestamp to implement the `insert_overwrite` strategy. Using `merge` is a costly strategy, so it defeats the purpose of leveraging incremental logic.
- There were also concerns about the incremental logic returning incorrect data in some end models. For example, if a repeat order within the `new_vs_repeat` CTE logic in `shopify__orders` was calculated within the specified incremental window but the new order was not in that same time period, it could be incorrectly processed as a new order.
I worry this is bordering on too much information, and customers may not entirely understand this update. Here is a recommendation for making it a bit more succinct:
```diff
-- These models utilized the `merge` incremental strategy on BigQuery and Databricks, as we could not rely on a time series timestamp to implement the `insert_overwrite` strategy. Using `merge` is a costly strategy, so it defeats the purpose of leveraging incremental logic.
-- There were also concerns about the incremental logic returning incorrect data in some end models. For example, if a repeat order within the `new_vs_repeat` CTE logic in `shopify__orders` was calculated within the specified incremental window but the new order was not in that same time period, it could be incorrectly processed as a new order.
+- Incremental strategies were removed from these models due to potential inaccuracies with the `merge` strategy on BigQuery and Databricks. For instance, the `new_vs_repeat` field in `shopify__orders` could produce incorrect results during incremental runs. To ensure consistency, this logic was removed across all warehouses. If the previous incremental functionality was valuable to you, please consider opening a feature request to revisit this approach.
```
Updated.
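The `new_vs_repeat` failure mode discussed above can be sketched roughly like this (hypothetical, simplified SQL, not the actual `shopify__orders` logic): on an incremental run, only orders inside the lookback window are scanned, so a customer's true first order can fall outside the scanned set and their in-window repeat order ranks first.

```sql
with windowed_orders as (

    select *
    from orders
    -- On an incremental run, only recent rows are scanned; a customer's
    -- earlier first order may fall outside this window.
    {% if is_incremental() %}
    where created_at >= (select max(created_at) from {{ this }})
    {% endif %}
)

select
    order_id,
    customer_id,
    case
        -- With the true first order filtered out, row_number() restarts
        -- at 1 inside the window and a repeat order is labeled 'new'.
        when row_number() over (partition by customer_id order by created_at) = 1
            then 'new'
        else 'repeat'
    end as new_vs_repeat
from windowed_orders
```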
CHANGELOG.md (Outdated)

## [Upstream Under-the-Hood Updates from `shopify_source` Package](https://github.com/fivetran/dbt_shopify_source/releases/tag/v0.15.0)
- (Affects Redshift only) Creates new `shopify_union_data` macro to accommodate Redshift's treatment of empty tables.
- For each staging model, if the source table is not found in any of your schemas, the package will create an empty table with 0 rows for non-Redshift warehouses and a table with 1 all-`null` row for Redshift destinations.
This reads a bit strangely. You mention above that this change only impacts Redshift users; however, the first part of this bullet describes the behavior for every warehouse other than Redshift. Can we switch this around so the Redshift change is mentioned first, and then say there should be no change in behavior for other warehouses?
Updated, as done in the source CHANGELOG.
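For intuition, here is a rough sketch (not the actual `shopify_union_data` implementation) of how a macro can branch on warehouse type: Redshift keeps one all-`null` row so its explicit datatype casts survive, while other warehouses get a 0-row table.

```sql
{% macro empty_table_sketch(columns) %}
-- `columns` is a hypothetical list of {name, type} dicts
select
{% for col in columns %}
    cast(null as {{ col.type }}) as {{ col.name }}{{ ',' if not loop.last }}
{% endfor %}
{% if target.type != 'redshift' %}
-- Non-Redshift warehouses get 0 rows; Redshift keeps the single null
-- row so an empty table's columns are not collapsed to varchar.
limit 0
{% endif %}
{% endmacro %}
```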
packages.yml
Outdated
revision: bugfix/redshift-limit-one | ||
warn-unpinned: false |
Reminder to swap this `revision` back before release.
@fivetran-joemarkiewicz Changes applied. This is ready for re-review!
Looks good, but one change needed in the README to bump the version range of the dbt_shopify package. Once that's applied this will be good for release review.
README.md (Outdated)
The README is still referencing the old v0.15.0 for dbt_shopify. This needs to be updated to reflect the latest version range.
@fivetran-joemarkiewicz Oops. Updated!
CHANGELOG.md (Outdated)

- `shopify__order_lines`
- `shopify__orders`
- `shopify__transactions`
- Incremental strategies were removed from these models due to potential inaccuracies with the `merge` strategy on BigQuery and Databricks. For instance, the `new_vs_repeat` field in `shopify__orders` could produce incorrect results during incremental runs. To ensure consistency, this logic was removed across all warehouses. If the previous incremental functionality was valuable to you, please consider opening a feature request to revisit this approach.
My understanding was that it was the incremental strategy in general and not limited to `merge`, especially since we had the models in question materializing as tables by default for BQ and Databricks, and delete+insert uses the same unique-key logic. Is this not the case? Unless I am mistaken, I would recommend generalizing it a bit:
```diff
-- Incremental strategies were removed from these models due to potential inaccuracies with the `merge` strategy on BigQuery and Databricks. For instance, the `new_vs_repeat` field in `shopify__orders` could produce incorrect results during incremental runs. To ensure consistency, this logic was removed across all warehouses. If the previous incremental functionality was valuable to you, please consider opening a feature request to revisit this approach.
+- Incremental strategies were removed from these models due to potential inaccuracies with the incremental strategy. For instance, the `new_vs_repeat` field in `shopify__orders` could produce incorrect results during incremental runs. If the previous incremental functionality was valuable to you, please consider opening a feature request to revisit this approach.
```
Ah, I wasn't certain, as the majority of the conversation centered around `merge`. I've tweaked this a little, let me know if this looks good!
lgtm!
* MagicBot/add-model-counts updates
* Update README.md

Co-authored-by: Avinash Kunnath <108772760+fivetran-avinash@users.noreply.github.com>
PR Overview
This PR will address the following Issue/Feature: [#95]
This PR will result in the following new package version: v0.16.0
Source release is breaking, necessitating an upgrade.
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
Bug Fixes
- Removed incremental logic from the following end models:
  - `shopify__discounts`
  - `shopify__order_lines`
  - `shopify__orders`
  - `shopify__transactions`
- These models utilized the `merge` incremental strategy on BigQuery and Databricks, as we could not rely on a time series timestamp to implement the `insert_overwrite` strategy. Using `merge` is a costly strategy, so it defeats the purpose of leveraging incremental logic.
- There were also concerns about the incremental logic returning incorrect data in some end models. For example, if a repeat order within the `new_vs_repeat` CTE logic in `shopify__orders` was calculated within the specified incremental window but the new order was not in that same time period, it could be incorrectly processed as a new order.

Upstream Under-the-Hood Updates from `shopify_source` Package
- Creates new `shopify_union_data` macro to accommodate Redshift's treatment of empty tables.
- For each staging model, if the source table is not found in any of your schemas, the package will create a table with 1 all-`null` row for Redshift destinations and an empty table with 0 rows for other warehouses. Without that row, Redshift materializes every column of the empty table as a `varchar`. This throws errors in downstream transformations in the `shopify` package. The 1 row will ensure that Redshift will respect the package's datatype casts.

PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
If you had to summarize this PR in an emoji, which would it be?
🇪🇺